feat: non-local redundancy #4491

Merged: 24 commits merged into master from feat/redundancy on Feb 8, 2024

Conversation

@nugaon (Member) commented Dec 4, 2023:

Along with neighborhood reserve replication and node caching, Reed-Solomon erasure coding and dispersed replicas make up the redundancy in Bee.

With these new redundancy mechanisms, the client can recover requested data from different neighborhoods of the Kademlia network.

Details are in the PR description.

Open API Spec Version Changes (if applicable)

I think we need to change the version after #4529

@nugaon nugaon marked this pull request as ready for review December 4, 2023 18:32
@nugaon nugaon changed the title from "feat: redundancy" to "feat: kademlia redundancy" on Dec 5, 2023
(Resolved review threads on openapi/Swarm.yaml, openapi/SwarmCommon.yaml, pkg/api/api.go, pkg/api/bzz.go, pkg/api/dirs.go, pkg/replicas/putter.go and pkg/replicas/replicas.go.)
@ldeffenb (Collaborator) left a comment:

Just noting the points that make no sense as I try to understand the upcoming redundancy improvements.

(Resolved review threads on pkg/api/dirs.go and pkg/api/api.go.)
@nugaon nugaon force-pushed the feat/redundancy branch 2 times, most recently from a83b3f7 to a63e161 on December 18, 2023 19:43
@nugaon nugaon changed the title from "feat: kademlia redundancy" to "feat: non-local redundancy" on Dec 19, 2023
(Resolved review thread on pkg/api/dirs.go.)
pkg/api/bzz.go (outdated):
Cache *bool `map:"Swarm-Cache"`
Strategy getter.Strategy `map:"Swarm-Redundancy-Strategy"`
FallbackMode bool `map:"Swarm-Redundancy-Fallback-Mode"`
ChunkRetrievalTimeout time.Duration `map:"Swarm-Chunk-Retrieval-Timeout"`
Member:

what is the purpose of this?

Member:

@istae Configuring this counts as an advanced dev option.
By setting the chunk retrieval timeout to a sufficiently short interval, one can effectively render some slow chunks unretrievable and thus 'simulate' chunk loss even in environments of perfect availability.
This is especially useful since it allows client-side triggering of the redundancy-based recovery of data implemented in this PR.
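For illustration only, a minimal sketch of how a client might exercise this from the outside, using the header names from the pkg/api/bzz.go snippet above. The endpoint path, reference placeholder and header values are assumptions, not part of this PR:

// Minimal sketch (not the PR's code): request a download with a short chunk
// retrieval timeout so that slow chunks look lost and recovery kicks in.
package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    // Placeholder reference and local API address, for illustration only.
    req, err := http.NewRequest(http.MethodGet, "http://localhost:1633/bzz/<reference>/", nil)
    if err != nil {
        panic(err)
    }
    req.Header.Set("Swarm-Redundancy-Strategy", "1")         // e.g. the DATA strategy
    req.Header.Set("Swarm-Redundancy-Fallback-Mode", "true") // allow falling back to other strategies
    req.Header.Set("Swarm-Chunk-Retrieval-Timeout", "100ms") // short timeout to 'simulate' chunk loss
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    body, _ := io.ReadAll(resp.Body)
    fmt.Println(resp.Status, len(body))
}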

ctx = getter.SetStrategy(ctx, headers.Strategy)
ctx = getter.SetStrict(ctx, headers.FallbackMode)
ctx = getter.SetFetchTimeout(ctx, headers.ChunkRetrievalTimeout)
reader, l, err := joiner.New(ctx, s.storer.Download(cache), s.storer.Cache(), reference)
Member:

storer.Cache() is not a reliable source for storing chunks for any operation; what is the purpose of using the cache here?

Member (Author):

what should I put here? There is no other putter available under s.storer.

Member:

what is the purpose of using the cache here?

Member:

so the cache must be used to store things locally that have no postage stamp. There are three ways to get there:

  • evicted from reserve (either too distant, too cheap or expired)
  • landed with us through retrieval and has no postage stamp (yet)
  • and now also created with the help of parities, no postage stamp available (yet).

Member:

Let's get to the bottom of this.

Reconstructed chunks have no postage stamp, therefore they cannot be in the reserve. Putting them in a pinstore is wasteful and unclear in terms of expiry.
The only possible place to put them is the localstore cache.

The status of reconstructed chunks is similar to that of chunks obtained through retrieval, which are also cached.
The only thing we absolutely need to make sure of is that these cached chunks can and will be put in the reserve once they are offered by peers as part of pullsyncing as chunks with a valid postage stamp.

There is one caveat about using the cache: the intricate scenario described and resolved in this HackMD: https://hackmd.io/@zelig/Bkqol64dp

Collaborator:

An even worse consideration is that the cache size is user-specified and can therefore be set to an arbitrarily small value, possibly ensuring that the reconstructed chunks have already been purged before they are needed again.

Member:

I am introducing a recoder accordingly.
Resolved in #4529.

(Resolved review threads on pkg/file/redundancy/level.go.)
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// the code below implements the integration of dispersed replicas in chunk fetching.
Contributor:

Still unresolved.

wg.Wait()
close(errc)
for err := range errc {
    errs = append(errs, err)
Contributor:

Still unresolved.

(Resolved review threads on pkg/replicas/putter.go, pkg/replicas/replicas.go, pkg/file/redundancy/level.go, pkg/api/api.go, openapi/SwarmCommon.yaml, pkg/api/bzz.go, pkg/file/pipeline/hashtrie/hashtrie_test.go and pkg/file/redundancy/getter/getter.go.)
@zelig zelig assigned zelig and nugaon Jan 6, 2024
(Resolved review threads on pkg/file/redundancy/getter/getter.go and pkg/file/redundancy/getter/strategies.go.)
}
defer cancelAll()
run := func(s Strategy) error {
    if s == PROX { // NOT IMPLEMENTED
Member:

what happens if the strategy is DATA? Because the strategy is incremented below, for cases where the strategy is DATA no other strategy is tried, since PROX is not implemented?


var stop <-chan time.Time
if s < RACE {
    timer := time.NewTimer(strategyTimeout)
@istae (Member) commented Jan 9, 2024:

max time allowed to fetch all the chunks from the network is 500 ms?
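For context, a simplified, self-contained sketch of the per-strategy timer pattern from the snippet above. The Strategy values and the 500 ms strategyTimeout are assumptions lifted from this conversation, not the PR's actual declarations:

// Simplified sketch: strategies below RACE get at most strategyTimeout to
// produce enough chunks before the caller may fall back to the next strategy.
package main

import (
    "context"
    "errors"
    "fmt"
    "time"
)

type Strategy int

const (
    NONE Strategy = iota // assumed ordering, for illustration only
    DATA
    PROX
    RACE
)

const strategyTimeout = 500 * time.Millisecond // the 500 ms questioned above

func runWithTimeout(ctx context.Context, s Strategy, ready <-chan struct{}) error {
    var stop <-chan time.Time
    if s < RACE { // RACE launches all requests at once, so no extra cap here
        timer := time.NewTimer(strategyTimeout)
        defer timer.Stop()
        stop = timer.C
    }
    select {
    case <-ready: // enough chunks retrieved under this strategy
        return nil
    case <-stop: // strategy timed out; the caller may try the next one
        return errors.New("strategy timed out")
    case <-ctx.Done():
        return ctx.Err()
    }
}

func main() {
    ready := make(chan struct{}) // never closed: simulates a strategy that stalls
    fmt.Println(runWithTimeout(context.Background(), DATA, ready))
}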

(Resolved review threads on pkg/file/redundancy/getter/strategies.go and pkg/file/redundancy/getter/getter.go.)
continue
default:
}
_ = g.fly(i, true) // commit (RS) or will commit to retrieve the chunk
Member:

why does missing call set the chunks in flight?

// Get will call parities and other sibling chunks if the chunk address cannot be retrieved
// assumes it is called for data shards only
func (g *decoder) Get(ctx context.Context, addr swarm.Address) (swarm.Chunk, error) {
    i, ok := g.cache[addr.ByteString()]
Member:

is this cache even necessary?
Would the Get call ever receive an addr that is not part of this decoder?

Member:

haha, so the addr that Get receives is the address of a chunk within the scope of its parent (packed address chunk = intermediate chunk).
The decoders for every parent scope are cached in the joiner.
This cache here is actually an index mapping addresses (children of the parent) to positions.
It should probably be renamed.
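A minimal, self-contained sketch of the address-to-position index described above; the names are illustrative stand-ins, not the PR's actual fields:

// Sketch only: in the PR the keys are sibling chunk addresses
// (swarm.Address.ByteString()) within one parent's scope.
package main

import "fmt"

func newIndex(addrs []string) map[string]int {
    idx := make(map[string]int, len(addrs))
    for i, a := range addrs {
        idx[a] = i // shard position of this child within the parent
    }
    return idx
}

func main() {
    idx := newIndex([]string{"addr-0", "addr-1", "addr-2"})
    fmt.Println(idx["addr-1"]) // 1: the position the decoder looks up in Get
}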

if !ok {
    return nil, storage.ErrNotFound
}
if g.fly(i, true) {
Member:

the prefetch call essentially sets all of these chunks to inflight, no?
Why is this fly check necessary? Why can't we jump to the select below?

Member:

because we want singleflight behaviour on fetching, i.e. if prefetching fetches a chunk and it is then queried with joiner Get --> decoder Get (or the other way round), then we should just wait on the inflight fetch.
Similarly, when prefetch has fetched shardCnt chunks, the other chunks can be put to inflight, so that if they are `Get-ed` by the joiner they are not fetched but wait until recovered.

Member:

And of course there is not always a prefetch on every data chunk (NONE and PROX will not prefetch some chunks).
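A minimal sketch of the single-flight commit described above, assuming one atomic flag per chunk position; the real fly in this PR takes extra arguments and does more bookkeeping:

// Sketch only: fly returns true for the first caller that commits position i,
// so a chunk is fetched (or recovered) at most once; later callers just wait.
package main

import (
    "fmt"
    "sync/atomic"
)

type inflight []atomic.Bool

func (f inflight) fly(i int) bool {
    return f[i].CompareAndSwap(false, true)
}

func main() {
    f := make(inflight, 4)
    fmt.Println(f.fly(2)) // true: this caller fetches chunk 2
    fmt.Println(f.fly(2)) // false: already in flight, wait for the result
}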

// if all chunks are retrieved, signal ready
n := g.fetchedCnt.Add(1)
if n == int32(g.shardCnt) {
    close(g.ready) // signal that just enough chunks are retrieved for decoding
Member:

are we allowed to close ready if the shardCnt includes parity chunks?

Member:

surely
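For the record, the Reed-Solomon property behind this answer: with shardCnt data shards and parityCnt parities, any shardCnt of the shardCnt+parityCnt chunks are enough to reconstruct the data, so it is fine for parity chunks to count toward the ready threshold. A small illustration using the github.com/klauspost/reedsolomon library, purely as an example:

// Illustration only: lose two data shards, keep shardCnt chunks in total
// (two data + two parity), and reconstruction still succeeds.
package main

import (
    "fmt"

    "github.com/klauspost/reedsolomon"
)

func main() {
    const shardCnt, parityCnt = 4, 2
    enc, err := reedsolomon.New(shardCnt, parityCnt)
    if err != nil {
        panic(err)
    }
    // Four data shards plus two pre-allocated parity shards, all the same size.
    shards := make([][]byte, shardCnt+parityCnt)
    for i := 0; i < shardCnt; i++ {
        shards[i] = []byte{byte(i), byte(i * 10)}
    }
    for i := shardCnt; i < shardCnt+parityCnt; i++ {
        shards[i] = make([]byte, 2)
    }
    if err := enc.Encode(shards); err != nil { // fill the parity shards
        panic(err)
    }
    shards[0], shards[1] = nil, nil // lose two data shards
    if err := enc.Reconstruct(shards); err != nil {
        panic(err)
    }
    fmt.Println(shards[0], shards[1]) // the original data shards are back
}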


// Get will call parities and other sibling chunks if the chunk address cannot be retrieved
// assumes it is called for data shards only
func (g *decoder) Get(ctx context.Context, addr swarm.Address) (swarm.Chunk, error) {
@istae (Member) commented Jan 10, 2024:

what this Get should ideally do is wait for either the chunk channel g.waits[i] to finish OR for a signal that the recovery has finished (and the chunk is available).
There should also be an error channel that returns when a chunk is not fetchable AND unrecoverable, which would translate into a storage.ErrNotFound.

As it stands, this Get never properly returns an error, and simply waits for a context timeout in the case that recovery and/or fetch failed.
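A rough sketch of the suggested behaviour, not the PR's code: Get waits on the chunk's own fetch, on a recovery signal, on a terminal-failure channel, or on the context, instead of only running into a context timeout. The recovered, failed and data fields are assumptions:

// Sketch of the suggested Get; the decoder fields used here are illustrative
// stand-ins for whatever the implementation ends up using.
func (g *decoder) Get(ctx context.Context, addr swarm.Address) (swarm.Chunk, error) {
    i, ok := g.cache[addr.ByteString()]
    if !ok {
        return nil, storage.ErrNotFound
    }
    select {
    case <-g.waits[i]: // this chunk's own fetch completed
    case <-g.recovered: // recovery finished and the chunk is now available
    case <-g.failed: // not fetchable AND not recoverable
        return nil, storage.ErrNotFound
    case <-ctx.Done():
        return nil, ctx.Err()
    }
    return swarm.NewChunk(addr, g.data[i]), nil
}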

Co-authored-by: nugaon <[email protected]>
Co-authored-by: Anatol <[email protected]>
Co-authored-by: dysordys <[email protected]>
Co-authored-by: Gyorgy Barabas <[email protected]>
acha-bill previously approved these changes Feb 7, 2024
@acha-bill acha-bill dismissed their stale review February 7, 2024 12:00

will review after conflicts are resolved

@istae istae merged commit 0ece898 into master Feb 8, 2024
12 checks passed
@istae istae deleted the feat/redundancy branch February 8, 2024 10:15
istae added a commit that referenced this pull request Feb 8, 2024